19 research outputs found

    Spanish named entity recognition in the biomedical domain

    Named Entity Recognition in the clinical domain, and in languages other than English, is hampered by the absence of complete dictionaries, the informality of texts, the polysemy of terms, the lack of agreement on entity boundaries, and the scarcity of corpora and other resources. We present a Named Entity Recognition method for poorly resourced languages. The method was tested with Spanish radiology reports and compared with a conditional random fields system. Peer reviewed. Postprint (author's final draft).

    Arabic medical entity tagging using distant learning in a Multilingual Framework

    A semantic tagger is presented that detects relevant entities in Arabic medical documents and tags them with their appropriate semantic class. The system takes advantage of a Multilingual Framework covering four languages (Arabic, English, French, and Spanish), so that the resources available for each language can be used to improve the results of the others. This is especially important for less-resourced languages such as Arabic. The approach has been evaluated against Wikipedia pages in the four languages belonging to the medical domain. The core of the system is the definition of a base tagset consisting of the three most represented classes in the SNOMED-CT taxonomy, and the learning of a binary classifier for each semantic category in the tagset and each language, using a distant learning approach over three widely used knowledge resources, namely Wikipedia, DBpedia, and SNOMED-CT.

    Semantic tagging and normalization of French medical entities

    In this paper we present two tools for addressing Task 2 of CLEF eHealth 2016. The first is a semantic tagger that detects relevant entities in French medical documents, tags them with their appropriate semantic class, and normalizes them with the Semantic Groups codes defined in the UMLS. It is based on a distant learning approach that uses several SVM classifiers, which are combined to give a single result. The second tool is based on a symbolic procedure that obtains the English translation of each medical term and looks for normalization information in publicly accessible resources. Peer reviewed. Postprint (published version).

    Semantic tagging of French medical entities using distant learning

    In this paper we present a semantic tagger that detects relevant entities in French medical documents and tags them with their appropriate semantic class. These experiments have been carried out in the framework of the CLEF2015 eHealth contest, which proposes a tagset of ten classes from the UMLS taxonomy. The system uses a set of binary classifiers and a mechanism for combining their results. The classifiers are learned from two widely used knowledge sources, one domain-restricted and the other domain-independent. Peer reviewed. Postprint (published version).

    Evolución de la enseñanza de la informática y las TIC en la Escuela Media en Argentina en los últimos 35 años

    Nowadays, the lack of basic knowledge and competences in technology hinders the possibility of obtaining qualified jobs, embarking on higher studies and participating in society. For that reason, the teaching of computer science in secondary schools is of high interest. In a country where a great need for graduates in informatics is expected, it is fundamental to have competent teachers who encourage students and to have public policies that make this growth possible. Working with informatics during secondary school could introduce students to useful concepts of computational thinking that can be applied in any profession. This article describes events related to the teaching of informatics in Argentina, focusing on its development in secondary education.

    Syntactic methods for negation detection in radiology reports in Spanish

    Identification of the certainty of events is an important text mining problem. In particular, biomedical texts report medical conditions or findings that might be factual, hedged or negated. Identifying negation and its scope over a term of interest determines whether a finding is reported, and it is a challenging task. Not much work has been performed for Spanish in this domain. In this work we introduce different algorithms developed to determine whether a term of interest is under the scope of negation in radiology reports written in Spanish. The methods include syntactic techniques based on rules derived from PoS tagging patterns, constituent tree patterns and dependency tree patterns, and an adaptation of NegEx, a well-known rule-based negation detection algorithm (Chapman et al., 2001a). All methods outperform a simple dictionary lookup algorithm developed as a baseline. NegEx and the PoS tagging pattern method obtain the best results, with 0.92 F1. Peer reviewed. Postprint (published version).

    A study of Hate Speech in Social Media during the COVID-19 outbreak

    In pandemic situations, hate speech propagates in social media, new forms of stigmatization arise and new groups are targeted with this kind of speech. In this short article, we present work in progress on the study of hate speech in Spanish tweets related to newspaper articles about the COVID-19 pandemic. We cover two main aspects: the construction of a new corpus annotated for hate speech in Spanish tweets, and the analysis of the collected data in order to answer questions from the social field, aided by modern computational tools. Definitions and progress are presented for both aspects. For the corpus, we introduce the data collection process, the annotation schema and criteria, and the data statement. For the analysis, we present our goals and their associated questions. We also describe the definition and training of a hate speech classifier, and present preliminary results obtained with it.
    Affiliations: Cotik, Viviana: Universidad de Buenos Aires, Facultad de Ciencias Exactas y Naturales, Argentina. Debandi, Natalia: Universidad Nacional de Río Negro, Argentina. Luque, Franco: Universidad Nacional de Córdoba, Facultad de Matemática, Astronomía, Física y Computación, Argentina; Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina. Miguel, Paula: Universidad de Buenos Aires, Argentina. Moro, Agustín: Universidad de Buenos Aires, Argentina; Universidad Nacional del Centro, Argentina. Pérez, Juan Manuel: Universidad de Buenos Aires, Facultad de Ciencias Exactas y Naturales, Argentina. Serrati, Pablo: Universidad de Buenos Aires, Argentina. Zajac, Joaquín: Universidad de Buenos Aires, Argentina. Zayat, Demián: Universidad de Buenos Aires, Argentina.

    Assessing the impact of contextual information in hate speech detection

    In recent years, hate speech has gained great relevance in social networks and other virtual media because of its intensity and its relationship with violent acts against members of protected groups. Due to the great amount of content generated by users, great effort has been made in the research and development of automatic tools to aid the analysis and moderation of this speech, at least in its most threatening forms. One of the limitations of current approaches to automatic hate speech detection is the lack of context. Most studies and resources rely on data without context; that is, isolated messages without any conversational context or indication of the topic being discussed. This limits the information available for deciding whether a post on a social network is hateful or not. In this work, we provide a novel corpus for contextualized hate speech detection based on user responses to news posts from media outlets on Twitter. This corpus was collected in the Rioplatense dialectal variety of Spanish and focuses on hate speech associated with the COVID-19 pandemic. Classification experiments using state-of-the-art techniques show evidence that adding contextual information improves hate speech detection performance for two proposed tasks (binary and multi-label prediction). We make our code, models, and corpus available for further research.
